Keyword Extraction and Semantic Tag Prediction
Abstract
Content on the web is often organized through user-generated tags for intuitive search and retrieval. Such tags convey meta-information about the subject matter of the texts they represent. For this project, we applied machine learning methods (Bayesian co-occurrence, k-NN, SVM, and NNS) to predict the tags of StackExchange posts obtained from the Kaggle competition “Facebook Recruiting Keyword Extraction III.” Using our non-parametric, fuzzy Nearest Neighbor Search (NNS) algorithm, we achieved an F1-score of 0.471 on a test set of unseen data and 0.773 on the Kaggle test set (which contains many duplicate data points). Furthermore, when predicting a single tag per post, our algorithm attained approximately 71.1% accuracy on unseen data, surpassing the 65% accuracy attained by Stanley & Byrne (2013). Our keyword-tag co-occurrence model and fuzzy NNS proved fast and practical for large-scale subject and tag prediction problems with tens of thousands of tags and training documents.
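As a rough illustration of the keyword-tag co-occurrence idea and the F1 metric mentioned in the abstract, the following minimal Python sketch counts how often each word appears alongside each tag and scores candidate tags by summing the estimated P(tag | word) over the words of a post. This is not the authors' implementation: the function names (`train_cooccurrence`, `predict_tags`, `f1_score`), the scoring rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_cooccurrence(posts):
    """Count word/tag co-occurrences.

    posts: iterable of (words, tags) pairs, each a list of strings.
    Returns (word_tag, word_count), where word_tag[w][t] is the number of
    posts containing word w that carry tag t, and word_count[w] is the
    number of posts containing word w.
    """
    word_tag = defaultdict(Counter)
    word_count = Counter()
    for words, tags in posts:
        for w in set(words):
            word_count[w] += 1
            for t in set(tags):
                word_tag[w][t] += 1
    return word_tag, word_count


def predict_tags(words, word_tag, word_count, k=3):
    """Return up to k tags ranked by the summed estimate of P(tag | word)."""
    scores = Counter()
    for w in set(words):
        n = word_count.get(w, 0)
        if n == 0:
            continue  # word never seen in training; it contributes nothing
        for t, c in word_tag[w].items():
            scores[t] += c / n
    return [t for t, _ in scores.most_common(k)]


def f1_score(predicted, actual):
    """F1 between a predicted tag set and the true tag set for one post."""
    p, a = set(predicted), set(actual)
    tp = len(p & a)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(a)
    return 2 * precision * recall / (precision + recall)


# Toy training data (hypothetical, for illustration only).
posts = [
    (["python", "list", "sort"], ["python", "sorting"]),
    (["python", "dict"], ["python"]),
    (["java", "list"], ["java"]),
]
word_tag, word_count = train_cooccurrence(posts)
predicted = predict_tags(["python", "sort"], word_tag, word_count, k=2)
```

A model of this kind needs only one pass over the training posts and a dictionary lookup per word at prediction time, which is in keeping with the abstract's point about practicality at large scale.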
Similar resources
Some applications of a statistical tagger for Swedish
We will briefly describe a part-of-speech (POS) tagger for Swedish and discuss some applications: rule-based and probabilistic grammar checking, word prediction and keyword extraction. In POS tagging of a text, each word and punctuation mark in the text is assigned a morphosyntactic tag. We have designed and implemented a tagger based on a second order Hidden Markov Model [1]. Given a sequence o...
Optimizing title and Meta tags based on distribution of keywords; Lexical and semantic approaches
Problem statement: To increase traffic on websites, Search Engine Optimization (SEO) has provided many costly and time-consuming options. One problem is the inadequate distribution of keywords especially those keywords that users use the title tag and Meta tags. Approach: This study described work on an initial model for handling some of the SEO factors to increase the distribution of keywords....
MIKE: An Interactive Microblogging Keyword Extractor using Contextual Semantic Smoothing
Social media, such as tweets on Twitter and Short Message Service (SMS) messages on cellular networks, are short-length textual documents (short texts or microblog posts) exchanged among users on the Web and/or their mobile devices. Automatic keyword extraction from short texts can be applied in online applications such as tag recommendation and contextual advertising. In this paper we present ...
Exploring the Value of Folksonomies for Creating Semantic Metadata
Finding good keywords to describe resources is an on-going problem. Typically, we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well-populated source of unstructured tags describing Web resources. This article explores the value of the folksonomy tags as a potential source of keyword meta...
Finding User Semantics on the Web using Word Co-occurrence Information
With the currently growing interest in the Semantic Web, describing user semantics to model users and their social relationships is coming to play an important role. This paper proposes a novel keyword extraction method to extract user semantics from the Web. Based on co-occurrence information of words, the proposed method extracts relevant keywords depending on the context of a person. Our eva...
Publication date: 2013